Colombo Amenity Distribution: Point Pattern Analysis & Spatial Regression

Author

Nimesh Akalanka

Published

February 27, 2026

Introduction

This study explores the spatial distribution of education and healthcare facilities across the Colombo district in Sri Lanka. The analysis is structured into two main sections: Point Pattern Analysis and Spatial Regression Analysis.

User Stories

User Story 1

Prof. Jagath Munasinghe | Urban Planning Professor | University of Moratuwa

As an Urban Planning Professor at the University of Moratuwa, who researches spatial distribution of urban amenities across Sri Lankan cities,

I want to identify whether education and healthcare facilities has the same distribution pattern across Colombo district by measuring their spatial association at both interms of locations and GND scale,

So that I can determine whether Colombo’s urban amenity provision reflects a coordinated integrated planning effort or whether the two facility categories developed independently under different planning regimes, and use this evidence to contribute to academic debate on fragmented infrastructure development in Sri Lankan cities.


User Story 2

Plnr. Lasantha Perera | Infrastructure Planning Officer | Colombo City Council

As an Infrastructure Planning Officer at the Colombo City Council who is responsible for assessing the spatial adequacy of public facilities and advising on future infrastructure investments across the district,

I want to identify which factors among population density, building density, road density, and demographic age structure, are the most significantly influence the per capita distribution of education and healthcare facilities across Colombo district,

So that I can understand the structural drivers behind the current uneven facility distribution pattern and provide the city council with a quantitative evidence base for directing future education and healthcare investments to the GNDs that are most underserved relative to their population and infrastructure characteristics.


1 Data Loading & Preparation

These steps include the preparation of spatial data for the analysis of Colombo District. The data sources included in this analysis are as follows:

Data Source
Census Data Department of Census and Statistics
GND Boundaries Survey Department of Sri Lanka
Building Footprints OpenStreetMap (OSM)
Point of Interest (POI) Data OpenStreetMap (OSM)

The task involves loading relevant datasets, converting the coordinate system to EPSG:32644 for proper dimension and area calculations, and merging demographic data with building information. The goal is to calculate attributes related to population demographics, housing, road infrastructure, and building area.

  • Population Groups:
    • AG_0_14: Number of people aged 0-14.
    • AG_15_59: Number of people aged 15-59.
    • AG_60_64: Number of people aged 60-64.
    • AG_Above_65: Number of people aged above 65.
    • Male: Total number of males.
    • Female: Total number of females.
    • Total_Population: Total number of people.
  • Housing and Employment:
    • Occupied_Housing_Units: Number of occupied housing units.
    • Poverty_Percentage: Percentage of the population living in poverty.
    • Unemployment_Rate: Unemployment rate in the area.
  • Infrastructure:
    • Road_Density: Density of roads in the area.
    • Road_Length: Total length of roads in the area.
  • Calculations:
    • Population_Density: Population density per square kilometer.
    • Building_Area: Area of buildings (Square Meters)

Furthermore all the datasets are clipped in to the Colombo district boundary to ensure that the analysis is focused only for Colombo district.

# Load necessary libraries
library(sf)
library(dplyr)
library(tidyr)
library(ggplot2)
library(tmap)
library(terra)
library(spatstat)
library(spdep)
library(spatialreg)
library(knitr)
library(patchwork)
library(viridis)

# Improve figure clarity in rendered output
knitr::opts_chunk$set(
  fig.align = "center",
  fig.width = 9,
  fig.height = 6,
  dpi = 180,
  out.width = "100%"
)

# Select an projected coordinate system and set up the output directory
utm_crs <- 32644
output_dir <- "output"
dir.create(output_dir, recursive = TRUE, showWarnings = FALSE)
gpkg_path <- file.path(output_dir, "output_layers.gpkg")

# Load Sri Lankan GND boundaries
sl_gnds <- st_read("data/sl_gnds.shp", quiet = TRUE) %>%
  st_make_valid() %>%
  st_transform(utm_crs)

# Consider only Colombo district for this analysis
names(sl_gnds)
[1] "PROVINCE_N" "DISTRICT_N" "DSD_N"      "GND_N"      "geometry"  
unique(sl_gnds$DISTRICT_N)
 [1] "Kandy"        "Matale"       "Nuwara Eliya" "Ampara"       "Batticaloa"  
 [6] "Trincomalee"  "Anuradhapura" "Polonnaruwa"  "Kurunegala"   "Puttalam"    
[11] "Jaffna"       "Kilinochchi"  "Mannar"       "Mullaitivu"   "Vavuniya"    
[16] "Kegalle"      "Ratnapura"    "Galle"        "Hambantota"   "Matara"      
[21] "Badulla"      "Moneragala"   "Colombo"      "Gampaha"      "Kalutara"    
cmb_gnds <- sl_gnds %>% filter(DISTRICT_N == "Colombo")
cmb_boundary <- st_union(cmb_gnds)

# Read the census population data from census csv file
cmb_census <- read.csv("data/cmb_census_2024.csv")

# Join the census population data to the Colombo GNDs
# Consider both the GND and DSD names when Joining the attributes
cmb_gnds <- cmb_gnds %>%
  mutate(
    GND_N = trimws(toupper(GND_N)),
    DSD_N = trimws(toupper(DSD_N))
  ) %>%
  left_join(
    cmb_census %>%
      mutate(
        GND_Name = trimws(toupper(GND_Name)),
        DSD_Name = trimws(toupper(DSD_Name)),
        AG_0_14 = as.numeric(gsub(",", "", trimws(AG_0_14))),       
        AG_15_59 = as.numeric(gsub(",", "", trimws(AG_15_59))),
        AG_60_64 = as.numeric(gsub(",", "", trimws(AG_60_64))),
        AG_65_and_above = as.numeric(gsub(",", "", trimws(AG_65_and_above))),
        Male = as.numeric(gsub(",", "", trimws(Male))),
        Female = as.numeric(gsub(",", "", trimws(Female))),
        Total = as.numeric(gsub(",", "", trimws(Total))),
        Occupied_Housing_Units = as.numeric(gsub(",", "", trimws(Occupied_Housing_Units))),
        Poverty_Percentage = as.numeric(gsub(",", "", trimws(Poverty_Percentage))),
        Unemployment_Rate = as.numeric(gsub(",", "", trimws(Unemployment_Rate))),
        Road_Density = as.numeric(gsub(",", "", trimws(Road_Density))),
        Road_Length = as.numeric(gsub(",", "", trimws(Road_Length)))
      ) %>%
      select(
        DSD_Name, GND_Name, AG_0_14, AG_15_59, AG_60_64, AG_65_and_above, Male, Female, Total,
        Occupied_Housing_Units, Poverty_Percentage, Unemployment_Rate, Road_Density, Road_Length
      ),
    by = c("DSD_N" = "DSD_Name", "GND_N" = "GND_Name")
  ) %>%
  mutate(
    GND_N = tools::toTitleCase(tolower(GND_N)),
    DSD_N = tools::toTitleCase(tolower(DSD_N)),
    Area_sqkm = as.numeric(st_area(geometry)) / 1e6,           
    Pop_density_sqkm = Total / Area_sqkm
  )

# Check how many GNDs were successfully matched
cat("GNDs with population data:", sum(!is.na(cmb_gnds$Total)), "of", nrow(cmb_gnds), "\n")
GNDs with population data: 557 of 557 
# Load Sri Lankan Building footprints
cmb_build <- st_read("data/sl_buildings.shp", quiet = TRUE) %>%
  st_transform(utm_crs) %>% 
  st_intersection(cmb_boundary)

# Calculate total building footprint area per GND (Unit: square meters)
building_area_by_gnd <- cmb_build %>%
  mutate(building_area = as.numeric(st_area(.))) %>% 
  st_join(cmb_gnds %>% select(GND_N)) %>%  
  st_drop_geometry() %>%  
  group_by(GND_N) %>% 
  summarise(Building_area = sum(building_area, na.rm = TRUE), .groups = "drop")

# Join building area back to cmb_gnds 
cmb_gnds <- cmb_gnds %>%
  left_join(building_area_by_gnd, by = "GND_N") %>%
  mutate(Building_area = coalesce(Building_area, 0))

# Save the file to GeoPackage
st_write(cmb_build, gpkg_path, layer = "cmb_build", delete_layer = TRUE, quiet = TRUE)

# Show the information about the loaded buildings and POI data
cat("Number of GNDs loaded:", nrow(cmb_gnds), "\n")
Number of GNDs loaded: 557 

In this study mainly 02 amenity categories are considered in the OSM dataset under the fclass attribute:

  • Education Facilities:
    • school
    • university
    • college
    • kindergarten
  • Healthcare Facilites:
    • hospital
    • clinic
    • pharmacy
    • doctors
    • dentist
    • veterinary
    • nursing_home

Then other POI categories are filtered out and only the above two categories are kept for the study.

# Import Colombo POI data as points
cmb_poi <- st_read("data/sl_pois.shp", quiet = TRUE) %>%
  st_transform(utm_crs) %>%
  st_intersection(cmb_boundary)

# Inspect the POI attribute names
print(names(cmb_poi))
[1] "osm_id"   "code"     "fclass"   "name"     "geometry"
# Classify POIs based on fclass
cmb_poi <- cmb_poi %>%
  mutate(
    amenity_type = case_when(
      fclass %in% c("school", "university", "college", "kindergarten") ~ "education", 
      fclass %in% c("hospital", "clinic", "pharmacy", "doctors", "dentist", "veterinary", "nursing_home") ~ "healthcare",
      TRUE ~ NA_character_
    )) %>% filter(!is.na(amenity_type))

# Perform a spatial join to associate GND_N with POIs
cmb_poi <- st_join(cmb_poi, cmb_gnds %>% select(GND_N))

# Check if the GND_N is correctly added to the POI data
print(names(cmb_poi)) 
[1] "osm_id"       "code"         "fclass"       "name"         "amenity_type"
[6] "GND_N"        "geometry"    
head(cmb_poi)
Simple feature collection with 6 features and 6 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 374626.5 ymin: 750124.3 xmax: 386507 ymax: 762313.2
Projected CRS: WGS 84 / UTM zone 44N
     osm_id code     fclass                                  name amenity_type
1 191815840 2110   hospital               Asiri Surgical Hospital   healthcare
2 253022237 2110   hospital       Joseph Fraser Memorial Hospital   healthcare
3 254367261 2110   hospital             Wethara District Hospital   healthcare
4 255179653 2110   hospital Sri Jayawardenapura teaching hospital   healthcare
5 255670316 2081 university     University of Sri Jayewardenepura    education
6 256542591 2082     school   Somaweera Chandrasiri Junior School    education
                GND_N                  geometry
1         Narahenpita POINT (376228.3 762236.9)
2     Thimbirigasyaya POINT (374626.5 762313.2)
3             Wethara   POINT (386507 750124.3)
4     Thalapathpitiya POINT (381352.6 759326.6)
5 Gangodavila South b   POINT (379121 757704.9)
6         Mampe North POINT (380988.8 752143.1)
# Count education and healthcare POIs per GND
poi_counts_by_gnd <- cmb_poi %>%
  st_drop_geometry() %>%
  group_by(GND_N, amenity_type) %>%
  summarise(count = n(), .groups = "drop") %>%
  pivot_wider(names_from = amenity_type, values_from = count, values_fill = list(count = 0)) %>%
  transmute(
    GND_N,
    education_count = as.integer(education),
    healthcare_count = as.integer(healthcare)
  )

# Check the resulting table
print(names(poi_counts_by_gnd))
[1] "GND_N"            "education_count"  "healthcare_count"
print(head(poi_counts_by_gnd))
# A tibble: 6 × 3
  GND_N          education_count healthcare_count
  <chr>                    <int>            <int>
1 Akaravita                    1                0
2 Aluth Ambalama               1                0
3 Aluthkade East               2                0
4 Aluthmawatha                 5                1
5 Asiri Uyana                  0                4
6 Athurugiriya                 1                1
# Join the POI category counts with density, count per 1000 people
cmb_gnds <- cmb_gnds %>%
  select(-any_of(c("education", "healthcare", "education_count", "healthcare_count",
                   "education_density", "healthcare_density",
                   "education_per1000", "healthcare_per1000"))) %>%
  left_join(poi_counts_by_gnd, by = "GND_N") %>%
  mutate(
    education_count    = coalesce(education_count, 0L),
    healthcare_count   = coalesce(healthcare_count, 0L),
    education_density  = education_count / Area_sqkm,
    healthcare_density = healthcare_count / Area_sqkm,
    education_per1000  = (education_count / Total) * 1000,
    healthcare_per1000 = (healthcare_count / Total) * 1000
  )

# Save the classified POIs to GeoPackage
st_write(cmb_poi, gpkg_path, layer = "cmb_poi", delete_layer = TRUE, quiet = TRUE)
st_write(cmb_gnds, gpkg_path, layer = "cmb_gnds", delete_layer = TRUE, quiet = TRUE)

2 Visualize the POI Distribution

Visualize the two amenity categories (educational and healthcare) across Colombo.

Code
# Split the data based on amenity categories
education <- cmb_poi %>% filter(amenity_type == "education")
healthcare <- cmb_poi %>% filter(amenity_type == "healthcare")

# Define a function to create maps
make_map <- function(pts, title, dot_color) {
  ggplot() +
    geom_sf(data = cmb_gnds, fill = "#f7f7f7", color = "#d9d9d9", linewidth = 0.25) +
    geom_sf(data = st_union(cmb_gnds), fill = NA, color = "#333333", linewidth = 0.6) +
    geom_sf(
      data = pts,
      color = dot_color,
      size = 5,
      alpha = 0.75,
      shape = 16
    ) +
    labs(
      title = title,
      subtitle = paste0("n = ", nrow(pts), " points")
    ) +
    theme_void(base_size = 12) +
    theme(
      plot.title = element_text(
        face = "bold",
        hjust = 0.5, 
        size = 14,
        margin = margin(b = 3)
      ),
      plot.subtitle   = element_text(size = 10, color = "#5f5f5f", hjust = 0.5, margin = margin(b = 8)), 
      plot.background = element_rect(fill = "white", color = NA),
      plot.margin     = margin(10, 10, 10, 10)
    ) +
    coord_sf(expand = FALSE)
}

# Plot the map
m1 <- make_map(education, "Educational Facilities", "#2166ac")
m2 <- make_map(healthcare,  "Healthcare Facilities",  "#d6604d")

m1 / m2 +
  plot_annotation(
    title   = "Amenity Distribution across Colombo District",
    caption = "Source: OpenStreetMap | CRS: UTM Zone 44N",
    theme   = theme(
      plot.title   = element_text(face = "bold", size = 16, hjust = 0.5),
      plot.caption = element_text(color = "grey50", size = 9, hjust = 1)
    )
  )


3 Amenity Distribution Analysis

Are the educational, healthcare facilities in Colombo distributed randomly, clustered or regularly distributed? Since spatstat requries a observation window, Colombo boundary is used as the observation window.

3.1 Quadrat Count Analysis

Code
# Defining the observation window for the amenity categories
education_ppp <- as.ppp(education, W = as.owin(cmb_boundary))
healthcare_ppp <- as.ppp(healthcare, W = as.owin(cmb_boundary))

# Test the quadrant count for each amenity
education_quadrant <- quadratcount(education_ppp, nx = 10, ny = 10)
healthcare_quadrant <- quadratcount(healthcare_ppp, nx = 10, ny = 10)

# Plot the point distribution in the quardrants
op <- par(no.readonly = TRUE)
par(mfrow = c(2, 1), mar = c(3.8, 3.8, 3, 1.5), bg = "white")

plot(education_quadrant, main = "Educational Facilities | Quadrat Count", col = viridis(10), legend = TRUE, main.cex = 1.2)
plot(st_geometry(education), add = TRUE, col = adjustcolor("grey70", alpha.f = 0.4), pch = 16)
plot(as.owin(cmb_boundary), add = TRUE, border = "grey20", lwd = 1)

plot(healthcare_quadrant, main = "Healthcare Facilities | Quadrat Count", col = viridis(10),legend = TRUE, main.cex = 1.2)
plot(st_geometry(healthcare), add = TRUE, col = adjustcolor("grey70", alpha.f = 0.4), pch = 16)
plot(as.owin(cmb_boundary), add = TRUE, border = "grey20", lwd = 1)

Code
par(op)

As the quadrat count maps shows that both educational and healthcare facilities are not evenly distributed. In terms of both of these amenity categories majority of the amenities are located towards the western part of the district, which is higher urbanized area of the district.

3.2 Chi-squared Test for Complete Spatial Randomness (CSR)

Let’s do a chi-squared test to see if the observed distribution of amenities in the quadrants follow a complete spatial randomness (CSR) pattern or do they show clustering or regularity.

# Test the quadrat counts for each amenity category
quadrat.test(education_quadrant)

    Chi-squared test of CSR using quadrat counts

data:  
X2 = 2078.4, df = 77, p-value < 2.2e-16
alternative hypothesis: two.sided

Quadrats: 78 tiles (irregular windows)

p-value is less than 0.05 (if we consider a significance of 5%), reject the null hypothesis of complete spatial randomness (CSR) for educational facilities, indicating that they are not randomly distributed across Colombo. The observed distribution is likely to be clustered or regular. The magnitude of the test statistic says a strong evidence against randomness, indicates an extremely uneven distribution, where educational facilities are heavily concentrated in a small number of quadrats while large portions of Colombo have very few or none at all.

# Test the quadrat counts for each amenity category
quadrat.test(healthcare_quadrant)

    Chi-squared test of CSR using quadrat counts

data:  
X2 = 808.99, df = 77, p-value < 2.2e-16
alternative hypothesis: two.sided

Quadrats: 78 tiles (irregular windows)

The p-value is less than 0.05 (at a 5% significance level), we reject the null hypothesis of Complete Spatial Randomness (CSR) for healthcare facilities, indicating that they are not randomly distributed across Colombo. The observed distribution is likely to be clustered or regular. Although the test statistic (X² = 808.99) is considerably smaller than that of educational facilities, it still constitutes strong evidence against randomness, indicating an uneven distribution where healthcare facilities are concentrated in certain quadrats while other parts of Colombo remain comparatively underserved. This points to significant spatial clustering of healthcare facilities across the district.

3.3 Density Analysis

Perform a kernel density estimation to visualize the intensity of the given amenity categories see their visual differences in terms of spatial distribution across Colombo.

Code
# Calculate the kernel density
education_dens <- density(education_ppp, sigma = bw.diggle)
healthcare_dens <- density(healthcare_ppp, sigma = bw.diggle)

# Plot the kernel density
op <- par(no.readonly = TRUE)
par(mfrow = c(2, 1), mar = c(3.8, 3.8, 3, 1.5), bg = "white")

plot(education_dens, main = "Educational Facilities KDE", col = viridis(100))
plot(as.owin(cmb_boundary), add = TRUE, border = "grey20", lwd = 1)

plot(healthcare_dens, main = "Healthcare Facilities KDE", col = viridis(100))
plot(as.owin(cmb_boundary), add = TRUE, border = "grey20", lwd = 1)

Code
par(op)

# Plot the GND wise Density Map
p1 <- ggplot() +
  geom_sf(data = cmb_gnds, aes(fill = education_density),
          color = "#d9d9d9", linewidth = 0.15) +
  geom_sf(data = st_union(cmb_gnds), fill = NA,
          color = "#333333", linewidth = 0.6) +
  scale_fill_viridis_c(
    option    = "viridis",
    name      = "Facilities\nper km²",
    na.value  = "grey90",
    trans     = "sqrt"
  ) +
  labs(
    title    = "Educational Facility Density",
    subtitle = "Facilities per km² by GND"
  ) +
  theme_void(base_size = 12) +
  theme(
    plot.title      = element_text(face = "bold", size = 14, hjust = 0,
                                   margin = margin(b = 3)),
    plot.subtitle   = element_text(size = 10, color = "#5f5f5f", hjust = 0,
                                   margin = margin(b = 8)),
    plot.background = element_rect(fill = "white", color = NA),
    plot.margin     = margin(10, 10, 10, 10),
    legend.position = "right"
  ) +
  coord_sf(expand = FALSE)

p2 <- ggplot() +
  geom_sf(data = cmb_gnds, aes(fill = healthcare_density),
          color = "#d9d9d9", linewidth = 0.15) +
  geom_sf(data = st_union(cmb_gnds), fill = NA,
          color = "#333333", linewidth = 0.6) +
  scale_fill_viridis_c(
    option    = "viridis",
    name      = "Facilities\nper km²",
    na.value  = "grey90",
    trans     = "sqrt"
  ) +
  labs(
    title    = "Healthcare Facility Density",
    subtitle = "Facilities per km² by GND"
  ) +
  theme_void(base_size = 12) +
  theme(
    plot.title      = element_text(face = "bold", size = 14, hjust = 0,
                                   margin = margin(b = 3)),
    plot.subtitle   = element_text(size = 10, color = "#5f5f5f", hjust = 0,
                                   margin = margin(b = 8)),
    plot.background = element_rect(fill = "white", color = NA),
    plot.margin     = margin(10, 10, 10, 10),
    legend.position = "right"
  ) +
  coord_sf(expand = FALSE)

p1 / p2 +
  plot_annotation(
    title   = "GND wise Facility Density across Colombo District",
    caption = "Source: OpenStreetMap | Area-adjusted counts | CRS: UTM Zone 44N",
    theme   = theme(
      plot.title   = element_text(face = "bold", size = 16, hjust = 0.5),
      plot.caption = element_text(color = "grey50", size = 9, hjust = 1)
    )
  )

This aggregation towards the western part of the colombo district of both educational and healthcare facilities is clear when looking at the pixel and GND wise density maps.

3.4 Selecting an Appropriate Correlation Method

Let’s see is the distribution of educational and healthcare facilities are similar accross Colombo district. First check that what type of distribution the KDE values of both categories follw.

Code
set.seed(13244)

# Convert spatstat density objects to terra rasters
education_rast  <- rast(education_dens)
healthcare_rast <- rast(healthcare_dens)

# Resample healthcare raster to match education raster grid exactly
healthcare_rast_resampled <- resample(healthcare_rast, education_rast)

# Extract raster cell values
education_vals  <- values(education_rast)
healthcare_vals <- values(healthcare_rast_resampled)

# Remove NA cells (cells outside the observation window)
valid <- !is.na(education_vals) & !is.na(healthcare_vals)

# Shapiro-Wilk requires n <= 5000
sample_idx <- sample(which(valid), min(5000, sum(valid)))

# Plot the histogram to visualize the distribution of intensity values for both categories
hist(education_vals[valid],
     breaks = 50,
     main   = "Education KDE",
     xlab   = "Intensity",
     col    = "#2166ac80",
     border = "white")

Code
# Test on raw values of education
shapiro_edu <- shapiro.test(education_vals[sample_idx])
cat("Education KDE:    W =", round(shapiro_edu$statistic, 4), "| p =", shapiro_edu$p.value, "\n")
Education KDE:    W = 0.3716 | p = 3.614476e-85 

Since the p-value of the Shapiro-Wilk test is 3.614476e-85, which is significantly less than 0.05, null hypothesis of normality distribution can be rejected. This indicates that the distribution of educational facility intensity across Colombo significantly deviates from normality, suggesting a non-normal distribution.

Code
# Plot the histogram
hist(healthcare_vals[valid],
     breaks = 50,
     main   = "Healthcare KDE",
     xlab   = "Intensity",
     col    = "#d6604d80",
     border = "white")

Code
# Shapiro Wilk test for the KDE values of healthcare facilities
shapiro_hlth <- shapiro.test(healthcare_vals[sample_idx])
cat("Healthcare KDE:   W =", round(shapiro_hlth$statistic, 4), "| p =", shapiro_hlth$p.value, "\n")
Healthcare KDE:   W = 0.2859 | p = 4.798521e-88 

Also the healthcare KDE values have p-value of p = 4.798521e-88, which is significantly less than 0.05, which says that the healthcare facility intensity distribution across Colombo also significantly deviates from a normal distribution, indicating a non-normal distribution.

Both educational and healthcare facilities show non-normal distributions in their spatial intensity across the district. Therefore using the Pearson R correlation coefficient to measure the relationship between the two categories may not be appropriate.

3.5 Correlation Analysis

Since both educational and healthcare facility intensity distributions are confirmed to be non-normal, Spearman’s rank correlation is used as the primary measure of spatial association. The analysis is conducted at two spatial scales to provide a comprehensive understanding of the relationship between the two amenity types across Colombo.

Code
# Pixel wise KDE Correlation
# Spearman's rank correlation test
spearman_kde <- cor.test(
  education_vals[valid],
  healthcare_vals[valid],
  method = "spearman"
)

# GND wise KDE Correlation
# Spearman's rank correlation test
spearman_gnd <- cor.test(
  cmb_gnds$education_density,
  cmb_gnds$healthcare_density,
  method = "spearman"
)

# Summary of the correlation results
cat("Pixel wise Spearman Correlation rho:", round(spearman_kde$estimate, 4), "\n")
Pixel wise Spearman Correlation rho: 0.5544 
Code
cat("GND wise Spearman Correlation rho:", round(spearman_gnd$estimate, 4), "\n")
GND wise Spearman Correlation rho: 0.3229 
Scale rho Strength
Pixel wise KDE 0.554 Moderate positive
GND wise Density 0.323 Week positive

Both scales reveal a positive but moderate relationship between education and healthcare facility intensity across Colombo district. At the pixel level (ρ = 0.554), areas of high educational facility concentration tend to coincide with areas of high healthcare facility concentration, suggesting some degree of co-location at these scales. When aggregated to the GND level (ρ = 0.323), the association weakens, indicating that the co-location pattern is more pronounced at local pixel scale than at the GND scale.

In order to understand underline characteristics of the distribution of amenities and answer the question of which factors has contributed to the current state of these amenty distribution pattern spatial regression analysis needs to be conducted, which is the next step of this analysis.


4 Spatial Regression Analysis

I want to understand the factors that influence the distribution of education and healthcare facilities across Colombo district in terms of GND level attributes.

4.1 Data Exploration and Preparation

First see the distribution of education and healthcare facilities per 1000 people across GNDs

Code
# Define break points and labels
edu_breaks <- c(0, 0.001, 0.5, 1.0, 2.0, max(cmb_gnds$education_per1000, na.rm = TRUE))
hlt_breaks <- c(0, 0.001, 0.5, 1.0, 2.0, max(cmb_gnds$healthcare_per1000, na.rm = TRUE))

# Plot the GND wise map for education and healthcare facilities per 1000 people
p1 <- ggplot() +
  geom_sf(data = cmb_gnds, aes(fill = education_per1000), color = "#d9d9d9", linewidth = 0.15) +
  geom_sf(data = st_union(cmb_gnds), fill = NA,color = "#333333", linewidth = 0.6) +
  scale_fill_stepsn(
    colours = RColorBrewer::brewer.pal(5, "Blues"),
    breaks = c(0.001, 0.5, 1.0, 2.0),
    limits = c(0, max(cmb_gnds$education_per1000, na.rm = TRUE)),
    labels = c("Very Low (<0.5)", "Low (0.5–1)", "Medium (1–2)", "High (>2)"),
    name = "Education Facilities\nper 1000 people",
    na.value = "grey90",
    guide = guide_coloursteps(show.limits = FALSE)
  ) +
  labs(title = "Education Facilities per 1000 People") +
  theme_void(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5, margin = margin(b = 3)),
    plot.title.position = "plot", 
    plot.background = element_rect(fill = "white", color = NA),
    plot.margin = margin(10, 10, 10, 10),
    legend.position = "right"
  ) +
  coord_sf(expand = FALSE)

p2 <- ggplot() +
  geom_sf(data = cmb_gnds, aes(fill = healthcare_per1000), color = "#d9d9d9", linewidth = 0.15) +
  geom_sf(data = st_union(cmb_gnds), fill = NA, color = "#333333", linewidth = 0.6) +
  scale_fill_stepsn(
    colours = RColorBrewer::brewer.pal(5, "Reds"),
    breaks = c(0.001, 0.5, 1.0, 2.0),
    limits = c(0, max(cmb_gnds$healthcare_per1000, na.rm = TRUE)),
    labels = c("Very Low (<0.5)", "Low (0.5–1)", "Medium (1–2)", "High (>2)"),
    name = "Healthcare Facilities\nper 1000 people",
    na.value = "grey90",
    guide = guide_coloursteps(show.limits = FALSE)
  ) +
  labs(title = "Healthcare Facilities per 1000 People") +
  theme_void(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5,margin = margin(b = 3)),
    plot.title.position = "plot",
    plot.background = element_rect(fill = "white", color = NA),
    plot.margin = margin(10, 10, 10, 10),
    legend.position = "right"
  ) +
  coord_sf(expand = FALSE)

p1 / p2 +
  plot_layout(heights = c(1, 1)) +
  plot_annotation(
    title   = "Amenity Accessibility across Colombo District",
    caption = "Source: OpenStreetMap & Census 2024 | CRS: UTM Zone 44N",
    theme   = theme(
      plot.title   = element_text(face = "bold", size = 16, hjust = 0.5),
      plot.caption = element_text(color = "grey50", size = 9, hjust = 1)
    )
  ) &
  theme(
    plot.margin = margin(20, 10, 20, 10)
  )

Let’s prepare regression datasets for both educational and healthcare facilities. Since spatial models assume linearity log transformations are applied to skewed data such as population density, road density and occupied housing units.

Additionally, demographic percentages are calculated to capture the age structure of the population and also for the built up area, which can influence the demand for the type of amenities this study considers.

# Prepare the regression dataset
cmb_reg <- cmb_gnds %>%
  filter(
    !is.na(Total), Total > 0,
    !is.na(Pop_density_sqkm),
    !is.na(Poverty_Percentage),
    !is.na(Unemployment_Rate),
    !is.na(Road_Density),
    !is.na(Building_area),
    !is.na(Occupied_Housing_Units)
  ) %>%
  mutate(
    log_pop_density = log1p(Pop_density_sqkm),
    log_road_density = log1p(Road_Density),
    log_occ_housing = log1p(Occupied_Housing_Units),
    pct_builtup_area = (Building_area / Area_sqkm * 100),
    pct_builtup_area = scale(pct_builtup_area)[,1],
    pct_young = (AG_0_14 / Total) * 100,
    pct_elderly = (AG_65_and_above / Total * 100),
    pct_working_age = ((AG_15_59 + AG_60_64) / Total * 100)
  )

# Check the number of GNDs in the Regression dataset
cat("GNDs for regression:", nrow(cmb_reg), "\n")
GNDs for regression: 556 

Other than 1 GND with all the rest of the 556 GNDs has been considered for this regression analysis.

4.2 Build Spatial Weights Matrix

To analyze the spatial relationships, definition of spatial weights matrix is needed. In this study a queen contiguity-based spatial weights matrix is considered because this is an administrative boundary which any of the adjacentGNDs can be considered as a neighbor and influence each other.

Code
# Create a spatial weights matrix using queen contiguity
nb_queen <- poly2nb(cmb_reg, queen = TRUE)
lw_queen <- nb2listw(nb_queen, style = "W", zero.policy = TRUE)

# Check the neighbour structure
cat(
  "Total GNDs:", length(nb_queen), "\n",
  "Average number of neighbours:", round(mean(card(nb_queen)), 2), "\n",
  "GNDs with no neighbours:", sum(card(nb_queen) == 0), "\n",
  "Min neighbours:", min(card(nb_queen)), "\n",
  "Max neighbours:", max(card(nb_queen)), "\n"
)
Total GNDs: 556 
 Average number of neighbours: 5.58 
 GNDs with no neighbours: 0 
 Min neighbours: 1 
 Max neighbours: 13 

This spatial weights matrix contains 556 GNDs with an average of 5.58 neighbours per GND. There is no GND with zero neighbours, which means all GNDs have at least one adjacent GND. The maximum number of neighbours is 13.

4.3 Define Spatial Regression Formulas

The 02 formulas for spatial regression models are defined with same set of independent variables to analyze the factors influencing the distribution of education and healthcare facilities across Colombo district.

# Formular for education facilities per 1000 people
formula_education <- formula(
  education_per1000 ~ log_pop_density + Poverty_Percentage +
    Unemployment_Rate + log_road_density + log_occ_housing + pct_builtup_area + 
    pct_young + pct_elderly
)

# Formular for healthcare facilities per 1000 people
formula_healthcare <- formula(
  healthcare_per1000 ~ log_pop_density + Poverty_Percentage +
    Unemployment_Rate + log_road_density + log_occ_housing + pct_builtup_area + 
    pct_young + pct_elderly
)

4.4 OLS Baseline Models

First, OLS regression models are fitted for both educational and healthcare facilities to establish a baseline understanding of the relationships between the independent variables and the dependent variables without considering the spatial autocorrelation.

Code
# OLS regression models for both education and healthcare facilities
ols_education <- lm(formula_education, data = cmb_reg)
ols_healthcare <- lm(formula_healthcare, data = cmb_reg)

# Display the summary of the OLS models
summary(ols_education)

Call:
lm(formula = formula_education, data = cmb_reg)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.7911 -0.1801 -0.0663  0.0615  3.6620 

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)        -0.237418   0.332010  -0.715 0.474857    
log_pop_density    -0.100105   0.025994  -3.851 0.000132 ***
Poverty_Percentage -0.005566   0.006600  -0.843 0.399381    
Unemployment_Rate  -0.027688   0.020233  -1.368 0.171721    
log_road_density    0.121649   0.029300   4.152 3.83e-05 ***
log_occ_housing     0.064911   0.035163   1.846 0.065429 .  
pct_builtup_area    0.098181   0.020402   4.812 1.93e-06 ***
pct_young           0.020296   0.009490   2.139 0.032910 *  
pct_elderly         0.026996   0.008022   3.365 0.000818 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.375 on 547 degrees of freedom
Multiple R-squared:  0.1503,    Adjusted R-squared:  0.1378 
F-statistic: 12.09 on 8 and 547 DF,  p-value: 5.724e-16
Code
summary(ols_healthcare)

Call:
lm(formula = formula_healthcare, data = cmb_reg)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.66111 -0.15761 -0.04949  0.06864  2.67199 

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)         1.693721   0.283507   5.974 4.17e-09 ***
log_pop_density    -0.075233   0.022197  -3.389 0.000751 ***
Poverty_Percentage  0.003195   0.005635   0.567 0.571011    
Unemployment_Rate  -0.027336   0.017277  -1.582 0.114173    
log_road_density    0.121061   0.025020   4.839 1.70e-06 ***
log_occ_housing    -0.056503   0.030026  -1.882 0.060392 .  
pct_builtup_area    0.055878   0.017422   3.207 0.001418 ** 
pct_young          -0.035467   0.008104  -4.377 1.44e-05 ***
pct_elderly        -0.011858   0.006850  -1.731 0.083996 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3202 on 547 degrees of freedom
Multiple R-squared:  0.2056,    Adjusted R-squared:  0.194 
F-statistic:  17.7 on 8 and 547 DF,  p-value: < 2.2e-16

OLS baseline model shows that for education facilities per 1000 people these factors considered in the formular accounts for only 15%, for the healthcare facilities per 1000 people the R-squared value is 20.56, which means each models only explains about 15%, 20.56% of the changes in education and healthcare facilities distribution across GNDs with the given factors respectively.

4.5 Moran’s I Test on OLS Residuals

Then run the Moran’s I test on the residuals of the OLS models to see if there is any spatial autocorrelation present. If there is significant spatial autocorrelation in the residuals, it shows that the OLS model doesnt fully capture the spatial processes underline with the distribution of the education and healthcare amenities.

# Moran's I test on OLS residuals to check for spatial autocorrelation
moran_education <- lm.morantest(ols_education, lw_queen, zero.policy = TRUE)

# Display the results of Moran's I test
moran_education

    Global Moran I for regression residuals

data:  
model: lm(formula = formula_education, data = cmb_reg)
weights: lw_queen

Moran I statistic standard deviate = 10.002, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Observed Moran I      Expectation         Variance 
    0.2503638865    -0.0079900879     0.0006671923 

The Moran’s I test results shows significant spatial autocorrelation in the education facilities per 1000 people value of 0.25, which tell that there is a unexplained variation in the distribution of education facilities across GNDs.

# Moran's I test on OLS residuals to check for spatial autocorrelation
moran_healthcare <- lm.morantest(ols_healthcare, lw_queen, zero.policy = TRUE)

# Display the results of Moran's I test
moran_healthcare

    Global Moran I for regression residuals

data:  
model: lm(formula = formula_healthcare, data = cmb_reg)
weights: lw_queen

Moran I statistic standard deviate = 3.1514, p-value = 0.0008126
alternative hypothesis: greater
sample estimates:
Observed Moran I      Expectation         Variance 
    0.0734097026    -0.0079900879     0.0006671923 

Healthcare facilities show week clustering Moran’s I = 0.073, p < 0.001 confirming that even though the spatial structure is weaker, it is not negligible and cannot be attributed to chance.

In both cases, the null hypothesis of spatial randomness is rejected, meaning OLS assumptions are violated and spatial regression models are necessary for reliable inference.

4.6 Lagrange Multiplier Diagnostics

In order to identify which type of spatial model should be used for this study, Lagrange Multiplier (LM) diagnostics are performed on the OLS residuals.

Education Facilities LM Diagnostics

Code
lm_diag_education <- lm.LMtests(
  ols_education, lw_queen,
  test = c("LMlag", "LMerr", "RLMlag", "RLMerr"),
  zero.policy = TRUE
)

lm_diag_education

    Rao's score (a.k.a Lagrange multiplier) diagnostics for spatial
    dependence

data:  
model: lm(formula = formula_education, data = cmb_reg)
test weights: listw

RSlag = 82.552, df = 1, p-value < 2.2e-16


    Rao's score (a.k.a Lagrange multiplier) diagnostics for spatial
    dependence

data:  
model: lm(formula = formula_education, data = cmb_reg)
test weights: listw

RSerr = 90.572, df = 1, p-value < 2.2e-16


    Rao's score (a.k.a Lagrange multiplier) diagnostics for spatial
    dependence

data:  
model: lm(formula = formula_education, data = cmb_reg)
test weights: listw

adjRSlag = 0.017371, df = 1, p-value = 0.8951


    Rao's score (a.k.a Lagrange multiplier) diagnostics for spatial
    dependence

data:  
model: lm(formula = formula_education, data = cmb_reg)
test weights: listw

adjRSerr = 8.0371, df = 1, p-value = 0.004583
Test Statistic p-value
RSlag 82.552 <2.2e-16
RSerr 90.572 <2.2e-16
adjRSlag 0.017 0.8951
adjRSerr 8.037 0.004583

The robust diagnostics clearly indicate that spatial dependence in education facility distribution operates through the error term. The adjusted spatial error test (adjRSerr = 8.037, p = 0.005) is significant while the adjusted spatial lag test (adjRSlag = 0.017, p = 0.895) is not, saying that the Spatial Error Model is suited for the education facilities.

Healthcare Facilities LM Diagnostics

Code
lm_diag_healthcare <- lm.LMtests(
  ols_healthcare, lw_queen,
  test = c("LMlag", "LMerr", "RLMlag", "RLMerr"),
  zero.policy = TRUE
)

lm_diag_healthcare

    Rao's score (a.k.a Lagrange multiplier) diagnostics for spatial
    dependence

data:  
model: lm(formula = formula_healthcare, data = cmb_reg)
test weights: listw

RSlag = 3.4042, df = 1, p-value = 0.06503


    Rao's score (a.k.a Lagrange multiplier) diagnostics for spatial
    dependence

data:  
model: lm(formula = formula_healthcare, data = cmb_reg)
test weights: listw

RSerr = 7.7867, df = 1, p-value = 0.005263


    Rao's score (a.k.a Lagrange multiplier) diagnostics for spatial
    dependence

data:  
model: lm(formula = formula_healthcare, data = cmb_reg)
test weights: listw

adjRSlag = 3.7554, df = 1, p-value = 0.05264


    Rao's score (a.k.a Lagrange multiplier) diagnostics for spatial
    dependence

data:  
model: lm(formula = formula_healthcare, data = cmb_reg)
test weights: listw

adjRSerr = 8.138, df = 1, p-value = 0.004335
Test Statistic p-value
RSlag 3.404 0.06503
RSerr 7.787 0.005263
adjRSlag 3.755 0.05264
adjRSerr 8.138 0.004335

The robust diagnostics similarly point to a Spatial Error Model for healthcare facilities. The adjusted spatial error test (adjRSerr = 8.138, p = 0.004) is significant while the adjusted spatial lag test (adjRSlag = 3.755, p = 0.053) narrowly fails to reach significance, confirming that the spatial clustering in healthcare distribution is driven by unobserved spatially structured background factors rather than direct spillover effects between neighbouring GNDs.

4.7 Spatial Error Models (SEM)

Since the residual autocorrelation is positive for both models, a spatial error model is fitted to account for spatial autocorrelation in the error terms, which is likely due to unobserved spatially structured factors influencing the distribution of education and healthcare facilities across GNDs.

Education SEM

Code
sem_education <- errorsarlm(
  formula_education,
  data = cmb_reg,
  listw = lw_queen,
  zero.policy = TRUE
)



summary(sem_education)

Call:errorsarlm(formula = formula_education, data = cmb_reg, listw = lw_queen, 
    zero.policy = TRUE)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.820641 -0.159433 -0.051457  0.060284  3.576406 

Type: error 
Coefficients: (asymptotic standard errors) 
                     Estimate Std. Error z value  Pr(>|z|)
(Intercept)        -0.3311486  0.3342088 -0.9908 0.3217621
log_pop_density    -0.0786423  0.0275889 -2.8505 0.0043650
Poverty_Percentage -0.0064335  0.0078839 -0.8160 0.4144858
Unemployment_Rate  -0.0013160  0.0216732 -0.0607 0.9515832
log_road_density    0.0981049  0.0267494  3.6676 0.0002449
log_occ_housing     0.0553318  0.0339417  1.6302 0.1030589
pct_builtup_area    0.1315490  0.0235414  5.5880 2.297e-08
pct_young           0.0125422  0.0097254  1.2896 0.1971765
pct_elderly         0.0291283  0.0084875  3.4319 0.0005993

Lambda: 0.45249, LR test value: 69.254, p-value: < 2.22e-16
Asymptotic standard error: 0.05217
    z-value: 8.6733, p-value: < 2.22e-16
Wald statistic: 75.227, p-value: < 2.22e-16

Log likelihood: -204.4203 for error model
ML residual variance (sigma squared): 0.11694, (sigma: 0.34196)
Number of observations: 556 
Number of parameters estimated: 11 
AIC: 430.84, (AIC for lm: 498.09)
Predictor Estimate Std. Error z value p-value
(Intercept) -0.3311 0.3342 -0.991 0.322
log_pop_density -0.0786 0.0276 -2.851 0.004
Poverty_Percentage -0.0064 0.0079 -0.816 0.414
Unemployment_Rate -0.0013 0.0217 -0.061 0.952
log_road_density 0.0981 0.0267 3.668 <0.001
log_occ_housing 0.0553 0.0339 1.630 0.103
pct_builtup_area 0.1315 0.0235 5.588 <0.001
pct_young 0.0125 0.0097 1.290 0.197
pct_elderly 0.0291 0.0085 3.432 <0.001

Road accessibility and built-up intensity are the strongest drivers of education facility distribution, with better connected and more urbanised GNDs having significantly more education facilities. GNDs with higher population densities are relatively underserved on a per-capita basis. The strong spatial error parameter (λ = 0.452) indicates that unmeasured spatially structured factors account for a significant share of the clustering pattern.

Healthcare SEM

Code
sem_healthcare <- errorsarlm(
  formula_healthcare,
  data = cmb_reg,
  listw = lw_queen,
  zero.policy = TRUE
)

summary(sem_healthcare)

Call:
errorsarlm(formula = formula_healthcare, data = cmb_reg, listw = lw_queen, 
    zero.policy = TRUE)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.728531 -0.155783 -0.048381  0.057515  2.590605 

Type: error 
Coefficients: (asymptotic standard errors) 
                     Estimate Std. Error z value  Pr(>|z|)
(Intercept)         1.7755645  0.2902770  6.1168 9.548e-10
log_pop_density    -0.0729141  0.0231772 -3.1459 0.0016555
Poverty_Percentage  0.0045219  0.0061405  0.7364 0.4614889
Unemployment_Rate  -0.0197889  0.0181508 -1.0902 0.2756039
log_road_density    0.1044536  0.0246594  4.2358 2.277e-05
log_occ_housing    -0.0532194  0.0303939 -1.7510 0.0799478
pct_builtup_area    0.0695134  0.0188775  3.6824 0.0002311
pct_young          -0.0429953  0.0083979 -5.1197 3.060e-07
pct_elderly        -0.0133251  0.0071515 -1.8633 0.0624257

Lambda: 0.19021, LR test value: 7.9367, p-value: 0.0048441
Asymptotic standard error: 0.063169
    z-value: 3.0112, p-value: 0.0026023
Wald statistic: 9.0673, p-value: 0.0026023

Log likelihood: -147.2707 for error model
ML residual variance (sigma squared): 0.098756, (sigma: 0.31426)
Number of observations: 556 
Number of parameters estimated: 11 
AIC: 316.54, (AIC for lm: 322.48)
Predictor Estimate Std. Error z value p-value
(Intercept) 1.7756 0.2903 6.117 <0.001
log_pop_density -0.0729 0.0232 -3.146 0.002
Poverty_Percentage 0.0045 0.0061 0.736 0.461
Unemployment_Rate -0.0198 0.0182 -1.090 0.276
log_road_density 0.1045 0.0247 4.236 <0.001
log_occ_housing -0.0532 0.0304 -1.751 0.080
pct_builtup_area 0.0695 0.0189 3.682 <0.001
pct_young -0.0430 0.0084 -5.120 <0.001
pct_elderly -0.0133 0.0072 -1.863 0.062

Road accessibility and built-up intensity similarly drive healthcare facility distribution. However, the most striking finding is that GNDs with higher proportions of young residents are significantly underserved in healthcare, suggesting a demographic mismatch in provision. The weaker spatial error parameter (λ = 0.190) reflects that healthcare distribution is somewhat less spatially structured than education, consistent with the weaker Moran’s I found earlier.

Across both models, population density shows a consistent negative relationship, denser GNDs tend to have fewer facilities per capita, pointing to a potential equity concern in Colombo’s most densely populated areas.

4.8 Validate SEM Residuals

Code
# Morans'I test on SEM residuals to check if spatial autocorrelation is adequately accounted for
moran_sem_edu <- moran.test(residuals(sem_education), lw_queen, zero.policy = TRUE)
moran_sem_hlt <- moran.test(residuals(sem_healthcare), lw_queen, zero.policy = TRUE)

moran_sem_edu

    Moran I test under randomisation

data:  residuals(sem_education)  
weights: lw_queen    

Moran I statistic standard deviate = -0.98914, p-value = 0.8387
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
     -0.026867835      -0.001801802       0.000642183 
Code
moran_sem_hlt

    Moran I test under randomisation

data:  residuals(sem_healthcare)  
weights: lw_queen    

Moran I statistic standard deviate = -0.17518, p-value = 0.5695
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
    -0.0063130091     -0.0018018018      0.0006631888 

The Moran’s I test on SEM residuals confirms that both models have successfully removed the spatial autocorrelation present in the OLS residuals. Education SEM residuals show a Moran’s I of -0.027 (p = 0.839) and healthcare SEM residuals show -0.006 (p = 0.570), both well above the 0.05 significance threshold. The slightly negative Moran’s I values are ideal, they indicate the residuals are now pure noise with no remaining spatial pattern, confirming that the Spatial Error Models have adequately captured the underlying spatial structure in both facility distributions.

4.9 Plot Residual Map

Code
# Add SEM residual to the regression dataset for mapping
cmb_reg <- cmb_reg %>%
  mutate(
    resid_edu = residuals(sem_education),
    resid_hlt = residuals(sem_healthcare)
  )

# Plot the Map
m1 <- ggplot() +
  geom_sf(data = cmb_reg, aes(fill = resid_edu),
          color = "#d9d9d9", linewidth = 0.15) +
  geom_sf(data = st_union(cmb_reg), fill = NA,
          color = "#333333", linewidth = 0.6) +
  scale_fill_distiller(
    palette = "RdBu",
    name = "Residual",
    direction = 1,
    limits = c(-max(abs(cmb_reg$resid_edu), na.rm = TRUE), max(abs(cmb_reg$resid_edu), na.rm = TRUE))
  ) +
  labs(
    title = "Education per 1000 - SEM Residuals"
  ) +
  theme_void(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 13, hjust = 0),
    plot.subtitle = element_text(size = 10, color = "#5f5f5f", hjust = 0),
    plot.background = element_rect(fill = "white", color = NA)
  )

m2 <- ggplot() +
  geom_sf(data = cmb_reg, aes(fill = resid_hlt),
          color = "#d9d9d9", linewidth = 0.15) +
  geom_sf(data = st_union(cmb_reg), fill = NA,
          color = "#333333", linewidth = 0.6) +
  scale_fill_distiller(
    palette = "RdBu",
    name = "Residual",
    direction = 1,
    limits = c(-max(abs(cmb_reg$resid_hlt), na.rm = TRUE), max(abs(cmb_reg$resid_hlt), na.rm = TRUE))
  ) +
  labs(
    title = "Healthcare per 1000 - SEM Residuals"
  ) +
  theme_void(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 13, hjust = 0),
    plot.subtitle = element_text(size = 10, color = "#5f5f5f", hjust = 0),
    plot.background = element_rect(fill = "white", color = NA)
  )

m1 / m2 +
  plot_annotation(
    title = "Spatial Model Residual Maps, Colombo District",
    caption = "Source: OpenStreetMap & Census 2024 | CRS: UTM Zone 44N",
    theme = theme(
      plot.title   = element_text(face = "bold", size = 16, hjust = 0.5),
      plot.caption = element_text(color = "grey50", size = 9, hjust = 1)
    )
  )

Education map: Almost entirely white/near-zero residuals across the district with only a couple of isolated blue GNDs. No geographic clustering pattern is visible, residuals are randomly scattered.

Healthcare map: Similarly random mix of very light red and blue with no systematic geographic pattern. The values are small and spread without any clear directional clustering.

Both maps visually confirm what the Moran’s I validation already told us statistically, the spatial structure has been fully captured by the models.


Conclusion

Both facility types are significantly clustered in the more urbanised western part of Colombo rather than randomly distributed. The moderate positive correlation at both pixel (ρ = 0.554) and GND level (ρ = 0.323) suggests some co-location between the two facility types, but the relatively weak association indicates that education and healthcare facilities have developed independently than as a planned integrated system.

Road accessibility and built-up intensity are the strongest positive drivers of both education and healthcare facility distribution. Population density shows a consistent negative relationship in both models, meaning the most densely populated GNDs are underserved interms per capita, a clear equity concern in the Colombo district.

A notable demographic mismatch exists in healthcare, where GNDs with higher proportions of young residents have significantly fewer healthcare facilities. Poverty and unemployment were not significant in either cases, suggesting that socioeconomic deprivation alone does not determine facility provision. The significant spatial error parameters (education: λ = 0.452, healthcare: λ = 0.190) indicate that unmeasured factors are also in play that this analysis has not captured.